[GLUTEN-11302][VL] Fix gpu build by bumping to cuda-13.1 by zhouyuan · Pull Request #11275 · apache/gluten

zhouyuan · 2025-12-10T13:21:10Z

What changes are proposed in this pull request?

Fix the GPU build by switching to gcc-14 and bumping to cuda-toolkit-13.1.

cuda-toolkit-13.1 requires more disk space, so this patch also modifies the GHA workflow to clean up disk space first.
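The disk cleanup step is not shown in this conversation. As a rough sketch of the idea, assuming the workflow simply deletes directories it does not need before installing the toolkit, a helper could look like the following. The function names `avail_kib` and `free_disk_space` are hypothetical, and the directories to delete are left to the caller because the ones this PR actually removes are not stated here.

```shell
#!/bin/sh
# Sketch only: free disk space before installing a large toolkit.
# Which directories to delete is an assumption left to the caller;
# nothing here is taken from the actual Gluten workflow.

# Print the available space (in KiB) on the filesystem holding the given path.
avail_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Remove every directory passed as an argument.
# Empty arguments and "/" are refused; missing directories are skipped.
free_disk_space() {
  for dir in "$@"; do
    case "$dir" in
      ""|"/")
        echo "refusing to remove '$dir'" >&2
        continue
        ;;
    esac
    if [ -d "$dir" ]; then
      echo "removing $dir"
      rm -rf -- "$dir"
    else
      echo "skipping $dir (not present)"
    fi
  done
}
```

A workflow step could print `avail_kib /` before and after calling `free_disk_space` with the chosen directories, so the job log shows how much space was reclaimed.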

How was this patch tested?

The GHA checks pass.

fixes: #11302

Signed-off-by: Yuan <yuanzhou@apache.org>

bdice · 2025-12-10T22:51:52Z

cpp/velox/CMakeLists.txt

            ${VELOX_BUILD_PATH}/_deps/nvtx3-src/c/include
            ${VELOX_BUILD_PATH}/_deps/nvcomp_proprietary_binary-src/include
            ${VELOX_BUILD_PATH}/_deps/rapids_logger-src/include
+            /usr/local/cuda/include/cccl


If possible, we should try to fix this by calling find_package for cudf. It should set up all these include paths. I'll try to repro this locally with @karthikeyann and make a suggestion.

We don't want to require a specific CUDA version just to get a particular CCCL version -- those don't always move in lockstep and sometimes RAPIDS requires CCCL versions that have been publicly released but are not yet shipped in a CUDA toolkit.

zhouyuan · 2025-12-11

@bdice thanks for the input. Yes, I think we ran into a version mismatch between RAPIDS rmm and CUDA. Initially in Gluten we prepared a Docker env with cuda-12.8 pre-installed; everything worked, and the CMake piece targeted that old env. However, with the recent cudf-25.12 change it no longer compiles, so I'm experimenting with how to fix this. In my local env I need to bump to cuda-13.1, otherwise there are issues with some header definitions. I also tried cuda-12.9 and cuda-13.0; neither works:

/__w/incubator-gluten/incubator-gluten/dev/../ep/build-velox/build/velox_ep/_build/release/_deps/rmm-src/cpp/include/rmm/detail/cuda_memory_resource.hpp:23:49: error: 'synchronous_resource_with' is not a member of 'cuda::mr'
   23 | inline constexpr bool resource_with = cuda::mr::synchronous_resource_with<Resource, Properties...>;
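An error like this suggests that the CCCL headers found on the include path are older than what rmm expects. One quick way to check is to search a candidate include directory for the missing name. The sketch below assumes a `grep` that supports `-r`; the helper name `headers_mention` is hypothetical, and a text match only shows that the name appears somewhere in the headers, not that it is declared in `cuda::mr`.

```shell
#!/bin/sh
# Sketch only: check whether any file under an include directory
# mentions a given symbol. Exit status:
#   0  the symbol text was found
#   1  the symbol text was not found
#   2  the include directory does not exist
headers_mention() {
  include_dir=$1
  symbol=$2
  [ -d "$include_dir" ] || return 2
  # -r: recurse, -q: quiet, -F: treat the symbol as a fixed string
  grep -rqF -- "$symbol" "$include_dir"
}

# Example invocation (the path comes from the CMake diff above):
#   headers_mention /usr/local/cuda/include/cccl synchronous_resource_with
```

Running the same check against both the toolkit's `cccl` directory and the headers fetched under `_deps` would show which copy, if any, carries the name.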

bdice · 2026-01-13

I filed #11407 as a follow-up!

zhouyuan · 2025-12-18T06:29:16Z

ep/build-velox/src/build-velox.sh

    echo "enable GPU support."
-    COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=70 \
-        -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc"
+    COMPILE_OPTION="$COMPILE_OPTION -DVELOX_ENABLE_GPU=ON -DVELOX_ENABLE_CUDF=ON -DCMAKE_CUDA_ARCHITECTURES=75 \


cuda-13.1 supports compute capability 7.5 at minimum, so CMAKE_CUDA_ARCHITECTURES has to be 75 or higher.
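Instead of hardcoding the architecture, a build script could derive it from the installed toolkit. The sketch below assumes `nvcc --version` prints a line containing `release X.Y`. The helpers `parse_nvcc_release` and `pick_cuda_arch` are hypothetical names and are not part of build-velox.sh. The value 70 for older toolkits is just what the script used before this change, not a documented minimum.

```shell
#!/bin/sh
# Sketch only: choose CMAKE_CUDA_ARCHITECTURES from the CUDA release.

# Read `nvcc --version` output on stdin and print "MAJOR.MINOR".
parse_nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print 75 for CUDA 13 and newer, otherwise 70 (the script's earlier value).
pick_cuda_arch() {
  major=${1%%.*}
  if [ "$major" -ge 13 ]; then
    echo 75
  else
    echo 70
  fi
}

# Example use inside a build script:
#   release=$(nvcc --version | parse_nvcc_release)
#   COMPILE_OPTION="$COMPILE_OPTION -DCMAKE_CUDA_ARCHITECTURES=$(pick_cuda_arch "$release")"
```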

rui-mo

Thanks!

fix gpu build by bumping to cuda-13.1

0be69c8

Signed-off-by: Yuan <yuanzhou@apache.org>

github-actions bot added VELOX INFRA labels Dec 10, 2025

bdice reviewed Dec 10, 2025

zhouyuan force-pushed the wip_fix_gpu_build branch from 4129e03 to 2ef5147 Compare December 11, 2025 05:26

fix

948e987

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan force-pushed the wip_fix_gpu_build branch from 2ef5147 to 948e987 Compare December 11, 2025 11:07

zhouyuan added 6 commits December 15, 2025 02:50

fix

648e2f2

Signed-off-by: Yuan <yuanzhou@apache.org>

fix

f851a89

Signed-off-by: Yuan <yuanzhou@apache.org>

fix

384f7c1

Signed-off-by: Yuan <yuanzhou@apache.org>

Revert "fix"

e2a759d

This reverts commit 648e2f2.

fix cache

4a9dd17

Signed-off-by: Yuan <yuanzhou@apache.org>

fix

731609b

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan force-pushed the wip_fix_gpu_build branch from 55caa57 to 731609b Compare December 15, 2025 09:37

github-actions bot added the BUILD label Dec 15, 2025

zhouyuan added 4 commits December 15, 2025 10:39

test

85a2064

Signed-off-by: Yuan <yuanzhou@apache.org>

Revert "test"

35addab

This reverts commit 85a2064.

fix cuda path

d60e779

Signed-off-by: Yuan <yuanzhou@apache.org>

fix

faeb163

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan force-pushed the wip_fix_gpu_build branch from 0607115 to faeb163 Compare December 15, 2025 13:05

zhouyuan added 3 commits December 15, 2025 16:28

fix cuda version in path

5b91e79

Signed-off-by: Yuan <yuanzhou@apache.org>

fix cache

cd21863

Signed-off-by: Yuan <yuanzhou@apache.org>

fix gpu docker image

612d6a9

Signed-off-by: Yuan <yuanzhou@apache.org>

zhouyuan changed the title ~~[VL] Fix gpu build by bumping to cuda-13.1~~ [GLUTEN-11302][VL] Fix gpu build by bumping to cuda-13.1 Dec 16, 2025

zhouyuan marked this pull request as ready for review December 16, 2025 08:16

zhouyuan requested a review from jinchengchenghh December 16, 2025 09:38

zhouyuan commented Dec 18, 2025

rui-mo approved these changes Dec 18, 2025

zhouyuan merged commit 4e35abc into apache:main Dec 18, 2025
114 of 117 checks passed

bdice mentioned this pull request Jan 13, 2026

build(cudf): Simplify cuDF build configuration #11407

Closed

zhouyuan merged 15 commits into apache:main from zhouyuan:wip_fix_gpu_build
